Abstract

The total variation-based image denoising model has been generalized and extended in numerous ways, improving its performance in different contexts. We propose a new penalty function motivated by recent progress in the statistical literature on high-dimensional variable selection. Using a particular instantiation of the majorization-minimization algorithm, the optimization problem can be solved efficiently, and the resulting computational procedure is similar to that of the spatially adaptive total variation model. Using a two-pixel image model, we show theoretically that the new penalty function solves the bias problem inherent in the total variation model. The superior performance of the new penalty is demonstrated through several experiments. Our investigation is limited to "blocky" images, which have small total variation.

Key words:

1. Introduction

Denoising is probably the most common and most studied problem in image processing. Approaches developed so far include many methods arising from engineering, computer science, statistics and applied mathematics. Popular classes of existing denoising algorithms range from simple linear neighborhood filtering to sophisticated wavelet methods built on solid statistical foundations [1, 2, 3]. The PDE-based method, first proposed in [4], is unique in its formulation of images as functions in a suitable function space. Relatively few comparison studies exist among different methods, which is understandable because (1) there is a large number of existing denoising approaches with many different modifications and extensions, and (2) the success or failure of a given approach depends largely on the characteristics of the images at hand: cartoon or natural scene, grayscale or color, textured or solid objects. One exception is [5], which compared the standard total variation (TV) model with wavelet denoising and found TV to be inferior on some standard test images. Given the many refinements and extensions available within both the PDE-based and the wavelet-based classes, such as using higher-order derivatives or correlated wavelet coefficients, it is still hard to judge the relative merits of the two approaches from such results, although the prevailing view seems to be that wavelet-based methods work better for general images.

Denoting the unobserved original noiseless image by $u$, the goal of denoising is to recover this original image given an observed noisy image $f = u + n$, where $n$ denotes the noise. In traditional filtering as well as wavelet-based approaches, we think of images either as $m \times n$ matrices or as $N = mn$-dimensional vectors, while the PDE-based method generally treats images as bivariate functions defined on the unit square $\Omega = [0, 1] \times [0, 1]$. Introduced in [4], the standard total variation (TV) image denoising method estimates the original image by solving the minimization problem

$$\hat{u} = \arg\min_u \|f - u\|^2 + \lambda\,\mathrm{TV}(u), \qquad (1)$$

where $\|\cdot\|$ is the $L^2$ norm of a function and $\mathrm{TV}(u) = \int_\Omega |\nabla u|$ is the total variation of $u$ [4]. The regularization parameter $\lambda$ controls the trade-off between fidelity to the observed image and smoothness of the recovered image. The original paper [4] actually used an essentially equivalent formulation, minimizing the total variation subject to a constraint on the noise level, which is assumed to be known. The penalized $L^2$ version stated above is more convenient when the noise level is unknown, and we adopt this formulation in our study. Both practically and theoretically, this model is the best understood PDE-based method to date: images are considered as belonging to the space of functions of bounded variation (BV), and the existence and uniqueness of the solution are well established [6, 7, 8]. A discrete version of the TV model is considered in [5], which argues that all approaches have to go through discretization when implemented anyway. Our point of view is that using either the continuous or the discrete formulation of the PDE-based method makes little difference in practice.
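To make the discrete reading of (1) concrete, the following minimal sketch evaluates the objective for an image stored as a list of rows. It assumes forward differences with a zero difference past the last row and column (a Neumann-type boundary choice that the text does not specify) and the isotropic form $|\nabla u| = \sqrt{u_x^2 + u_y^2}$; the function names are hypothetical.

```python
import math


def forward_differences(u):
    """Forward differences (u_x, u_y); the difference past the last column/row is 0."""
    rows, cols = len(u), len(u[0])
    ux = [[u[r][c + 1] - u[r][c] if c + 1 < cols else 0.0 for c in range(cols)]
          for r in range(rows)]
    uy = [[u[r + 1][c] - u[r][c] if r + 1 < rows else 0.0 for c in range(cols)]
          for r in range(rows)]
    return ux, uy


def total_variation(u):
    """Discrete isotropic TV: sum over pixels of sqrt(u_x^2 + u_y^2)."""
    ux, uy = forward_differences(u)
    return sum(math.hypot(ux[r][c], uy[r][c])
               for r in range(len(u)) for c in range(len(u[0])))


def tv_objective(u, f, lam):
    """Discrete version of (1): ||f - u||^2 + lam * TV(u)."""
    fidelity = sum((f[r][c] - u[r][c]) ** 2
                   for r in range(len(u)) for c in range(len(u[0])))
    return fidelity + lam * total_variation(u)


# A 2x3 image with one vertical edge between the first and second columns.
u = [[0.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]
print(total_variation(u))  # 2.0: one unit jump in each of the two rows
```

A blocky image has few nonzero differences, so its TV is small, which is why penalizing it favors piecewise constant restorations.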

Although the standard TV model above might not be competitive for general image denoising tasks, it is believed to be ideal for blocky images, i.e., images that are nearly piecewise constant. From a statistical point of view, this can be seen simply from the fact that it penalizes the first partial derivatives (or, in the discrete version, first-order differences) and thus shrinks them to zero. [9] noted the inherent bias in the TV model and proposed the spatially adaptive total variation (SATV) model, which applies less smoothing near significant edges by means of a spatially varying weight function that is inversely proportional to the magnitude of the image derivatives. SATV is a two-step procedure: the weight function obtained in the first step using standard TV is used to guide the smoothing in the second step. The authors showed that, with a modest increase in computation, SATV is superior to standard TV in restoring piecewise constant image features.

Curiously, there has been an almost parallel development in the statistical literature in the context of high-dimensional linear regression with variable selection. As explained in the next section, these studies focus on regression problems where, although there are a priori numerous covariates, most of the regression coefficients are exactly zero, implying that the corresponding covariates have no effect on the response variable. Shrinking most regression coefficients to zero is thus a viable strategy for efficient estimation. For piecewise constant images, whose first derivatives are exactly zero at most locations, shrinking them to zero is likewise a reasonable approach. Taking advantage of this observation, we propose to adapt the smoothly clipped absolute deviation (SCAD) penalty [10, 11], which has become extremely popular in the statistical community, to our image denoising task. Although in the case of the TV model the correspondence between the functional-analytic approach and the statistical approach seems to be well known, and some authors have studied the properties of total variation in detail from a statistical point of view [12, 13], these statistical works are restricted to the one-dimensional case. Moreover, as far as we know, the parallelism stated above has not been fully exploited; in particular, the SCAD penalty has not been applied to penalize first-order differences, even in the one-dimensional case. Besides its superior performance in practice, the SCAD penalty has several advantages over SATV, most notably eliminating the extra parameter that a user needs to tune for SATV. As mentioned before, we think the discrete and continuous formulations make little difference, but we choose the continuous formulation since it simplifies description and notation significantly. The only problem is that, since the functional with the SCAD penalty is nonconvex, the existence of a solution is not guaranteed. The theoretically inclined reader may prefer to think in discrete terms so that this technical point does not arise. Our computational experiments show that SCAD is superior to SATV in terms of mean squared error (MSE). Although MSE is notoriously poor at describing the visual quality of an image, it is arguably less so for blocky images, where MSE describes the accuracy of restoration rather faithfully.

The rest of the paper is organized as follows. In the next section, we briefly review the TV and SATV models and point out their almost trivial connection to the Lasso and the adaptive Lasso developed in the statistical literature, so that readers from both fields can follow the motivation and development of the current paper. In Section 3, we adapt the SCAD penalty to our image denoising problem and discuss some of its properties in detail in this context. We also develop a majorization-minimization procedure using a first-order Taylor expansion, so that the computation involved reduces to one similar to the SATV model, although with a different weight function. In Section 4, we briefly review a method called Monte-Carlo SURE [14] for regularization parameter selection, which is used in our study when required. In Section 5, several computational experiments show the superiority of the proposed method in denoising blocky images. In these experiments, we also intentionally emphasize the difficulty encountered in tuning the SATV model. We conclude the paper with a discussion in Section 6.

2. Review of the TV and SATV models

The TV model proposed by [4] and presented above in equation (1) has received a great deal of attention in the last decade. In [9], the authors argued that less smoothing should be carried out where there are more features in the image. This motivated replacing the TV norm by the following more general weighted TV functional, giving the model

$$\hat{u} = \arg\min_u \|f - u\|^2 + \lambda \int_\Omega w\,|\nabla u|. \qquad (2)$$

The weight should be small in the presence of an edge so that less smoothing is performed near it. [9] used a weight function inversely proportional to the derivative, with a parameter $\epsilon$ added both to avoid division by zero and to serve as a tuning parameter controlling the amount of adaptivity. Thus, in their spatially adaptive total variation (SATV) model, $w = 1/(|u_x| + \epsilon) + 1/(|u_y| + \epsilon)$, where $u_x$ and $u_y$ are the partial derivatives. [9] used a two-step method. In the first step, the standard TV model (1) is used to estimate $u$, from which the partial derivatives (first-order differences) are computed. The derivatives are then used in (2) to compute the final restored image. If $\epsilon$ is chosen sufficiently large, SATV essentially reduces to standard TV. On the other hand, if $\epsilon$ is too small, artificial edges appear and the algorithm also becomes numerically unstable. We will see in our simulations that the result is somewhat sensitive to the choice of $\epsilon$ and that the appropriate amount of adaptivity is not universal across images, which makes $\epsilon$ difficult to choose in practice or, at the very least, leads to a sizable increase in the amount of computation required.
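The SATV weight can be sketched in a few lines to show how $\epsilon$ controls adaptivity. This is only an illustration under assumptions of ours: $u_x$ and $u_y$ are forward differences (zero past the last row and column), absolute values are taken as the "magnitude of image derivatives" description suggests, and `satv_weight` is a hypothetical name, not the implementation of [9].

```python
def satv_weight(u, eps):
    """w = 1/(|u_x| + eps) + 1/(|u_y| + eps), computed pixel by pixel."""
    rows, cols = len(u), len(u[0])
    w = []
    for r in range(rows):
        row = []
        for c in range(cols):
            ux = u[r][c + 1] - u[r][c] if c + 1 < cols else 0.0
            uy = u[r + 1][c] - u[r][c] if r + 1 < rows else 0.0
            row.append(1.0 / (abs(ux) + eps) + 1.0 / (abs(uy) + eps))
        w.append(row)
    return w


# A first-step TV estimate with a 255-unit vertical edge between columns 0 and 1.
u_tv = [[0.0, 255.0, 255.0],
        [0.0, 255.0, 255.0]]
for eps in (1.0, 10.0, 100.0):
    w = satv_weight(u_tv, eps)
    # Flat-region weight divided by edge weight: how much less the edge is smoothed.
    print(eps, w[0][1] / w[0][0])
```

As $\epsilon$ grows the ratio tends to 1 and the weight becomes nearly constant (about $2/\epsilon$), so SATV behaves like TV with a rescaled $\lambda$, in line with the remark above. With this particular weight, the $1/\epsilon$ term from $u_y$ along a vertical edge keeps the ratio below 2 in the toy example.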

As we mentioned in the introduction, there is an almost parallel line of development in the statistical literature that uses the same idea as SATV in a different context. Consider a linear regression problem $y_i = x_i^T\beta + \epsilon_i$ based on independent and identically distributed (i.i.d.) data $\{(y_i, x_i)\}_{i=1}^n$, where $x_i = (x_{i1}, \ldots, x_{ip})^T$ are the covariates, $\beta = (\beta_1, \ldots, \beta_p)^T$ are the regression coefficients, and $\epsilon_i$ is zero-mean noise. Sometimes one has good reason to believe that only a few of the covariates are related to $y_i$, i.e., many of the $\beta_j$'s are exactly zero. In these situations it is desirable to design an approach that shrinks many regression coefficients to zero automatically. The Lasso [15] does exactly that and is formulated as the minimization of the objective function

$$\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \lambda \sum_{j=1}^p |\beta_j|.$$

It is now well known that this estimator encourages many coefficients to be exactly zero, as desired, owing to the $L_1$-norm penalty on $\beta$. [16] later proposed the adaptive Lasso, which possesses better theoretical properties than the Lasso and also proves superior in practice; it solves the minimization problem

$$\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \lambda \sum_{j=1}^p |\beta_j| / |\hat\beta_j|,$$

where $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)^T$ is the standard least squares estimate. Any other reasonable estimate can be used (more rigorously, $\hat\beta$ must be consistent in statistical terms in order to enjoy the theoretical properties stated in that paper).
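The difference between the two estimators is easiest to see for a single coefficient, which is equivalent to assuming an orthonormal design (an assumption made only for this illustration). Minimizing $(y - \beta)^2 + \lambda|\beta|$ gives soft thresholding at $\lambda/2$, and the adaptive weight $1/|\hat\beta|$ lowers that threshold when the initial estimate is large. The function names below are hypothetical.

```python
import math


def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_scalar(y, lam):
    """argmin_b (y - b)^2 + lam * |b|, i.e. soft thresholding at lam / 2."""
    return soft_threshold(y, lam / 2.0)


def adaptive_lasso_scalar(y, lam, beta_init):
    """argmin_b (y - b)^2 + lam * |b| / |beta_init|, for a nonzero beta_init."""
    return soft_threshold(y, lam / (2.0 * abs(beta_init)))


lam = 4.0
for y in (0.5, 10.0):
    # With one coefficient, the least squares estimate is y itself.
    print(y, lasso_scalar(y, lam), adaptive_lasso_scalar(y, lam, beta_init=y))
```

For the small observation both estimators return zero, while for the large one the adaptive weight cuts the shrinkage from 2 to 0.2; this is the same mechanism SATV mimics with its derivative-dependent weight.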

The reader can immediately see the parallel developments in statistics 
and TV-based image processing. When it is desirable to shrink the first order 
differences in an image towards zero, the same arguments that lead to Lasso 
and adaptive Lasso now assume the form of TV and SATV respectively. In 
the statistical literature, [12, 13] studied the TV problem in its discrete form, 
but we have not seen any mention of utilizing adaptive Lasso to penalize the 
first order differences. 

Historically, before the appearance of the adaptive Lasso and to address the shortcomings of the Lasso (which is not consistent in variable selection), [10] proposed the smoothly clipped absolute deviation (SCAD) penalty, motivated by the desire to achieve several desirable properties of the estimator such as continuity and asymptotic unbiasedness. They also showed that the resulting estimator possesses the so-called oracle property, i.e., it is consistent in variable selection and behaves as if the zero coefficients were known in advance. In the next section, we adapt the SCAD penalty to image processing tasks. Using the SCAD penalty removes the clumsiness of having to choose the parameter $\epsilon$ in SATV, and our experiments show that its performance is superior to that of SATV.



3. Image Denoising with the SCAD penalty 

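In linear regression, using the SCAD penalty amounts to minimizing the functional

$$\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \sum_{j=1}^p p_\lambda(|\beta_j|), \qquad (3)$$

where $p_\lambda(\cdot)$ is most conveniently defined by its derivative

$$p'_\lambda(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}, \quad \text{for } \theta > 0,$$

and $p_\lambda(0) = 0$. As usual, $a = 3.7$ is used.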
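Integrating $p'_\lambda$ with $p_\lambda(0) = 0$ gives the familiar closed form of the SCAD penalty, stated here as a derived convenience rather than taken from the text: $p_\lambda(\theta) = \lambda\theta$ for $\theta \le \lambda$, $(2a\lambda\theta - \theta^2 - \lambda^2)/(2(a-1))$ for $\lambda < \theta \le a\lambda$, and $(a+1)\lambda^2/2$ for $\theta > a\lambda$. A minimal sketch of both functions (hypothetical names), with a central-difference check that they agree:

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): linear, then quadratic, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0


def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(theta) for theta >= 0: lam, then a linear decay, then 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


# Central-difference check of the derivative, with lambda = 1.
h = 1e-6
for theta in (0.5, 2.0, 5.0):
    numeric = (scad_penalty(theta + h, 1.0) - scad_penalty(theta - h, 1.0)) / (2 * h)
    print(theta, numeric, scad_derivative(theta, 1.0))
```

The constant tail for $\theta > a\lambda$ is what lets large differences (edges) pass without shrinkage.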

We plot the function $p_\lambda$ for $\lambda = 1$ in Fig 1(a) and its derivative in Fig 1(b). As seen in (3), we only use $p_\lambda$ and its derivative with a nonnegative argument. For convenience we plot both in Fig 1 as even functions, although the derivative should be an odd function if $p_\lambda$ is defined as an even function. Note that this penalty function, unlike the $L_1$ penalty used in the Lasso, is not convex. To use the SCAD penalty for image denoising, we formally write down the functional



$$\|f - u\|^2 + \int_\Omega p_\lambda(|\nabla u|). \qquad (4)$$

Some readers will object that $p_\lambda$ is nonconvex, so the existence of a solution to the above functional is in question. Even defining $p_\lambda(|\nabla u|)$ seems difficult, if not impossible: [7] only defined $\phi(|\nabla u|)$ when $\phi$ is convex and $u$ is a BV function. Because of this problem, we encourage such readers to switch to a discrete formulation, which follows straightforwardly from (4). The continuous form (4) is so much cleaner that we prefer to keep it; this should be only a minor nuisance for practitioners.

To see the effect of SCAD compared to TV clearly, we instead consider the following simple discrete problem:

$$\arg\min_{\theta_1, \theta_2}\ (y_1 - \theta_1)^2 + (y_2 - \theta_2)^2 + p_\lambda(|\theta_1 - \theta_2|), \qquad (5)$$

i.e., we consider an "image" with only two pixels. The following proposition compares the minimizers under the SCAD penalty and under the TV penalty; its proof is deferred to the Appendix.

Proposition 1. Suppose without loss of generality that $y_1 > y_2$.

(a) If $y_1 - y_2 > a\lambda$, the minimizer of (5) is $\hat\theta_1 = y_1$, $\hat\theta_2 = y_2$.

(b) If $y_1 - y_2 < \min_\zeta \{|\zeta| + p'_\lambda(|\zeta|)\}$, the minimizer of (5) is $\hat\theta_1 = \hat\theta_2 = (y_1 + y_2)/2$.

If instead the TV norm is used, i.e., $p_\lambda(|\theta_1 - \theta_2|)$ is replaced by $\lambda|\theta_1 - \theta_2|$ in (5), then

(c) if $y_1 - y_2 > \lambda$, the minimizer is $\hat\theta_1 = y_1 - \lambda/2$, $\hat\theta_2 = y_2 + \lambda/2$;

(d) if $y_1 - y_2 \le \lambda$, the minimizer is $\hat\theta_1 = \hat\theta_2 = (y_1 + y_2)/2$.

From the proposition, we see that for this simple two-pixel image model, although both penalties can shrink $\theta_1$ and $\theta_2$ to be exactly equal to each other, the SCAD penalty has the additional desirable property that no shrinkage is applied when the difference $|y_1 - y_2|$ is large enough. Part (c) of the proposition shows that the TV model is inherently biased, which is already known in more general contexts [17, 18]. Our experiments later also demonstrate this effect. The proof in the Appendix shows that this difference arises essentially from the fact that $p'_\lambda(\theta) = 0$ when $\theta$ is large enough (namely $\theta \ge a\lambda$).
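Proposition 1 can be checked numerically. Writing $s = \theta_1 + \theta_2$ and $d = \theta_1 - \theta_2$, the objective (5) becomes $((y_1 + y_2) - s)^2/2 + ((y_1 - y_2) - d)^2/2 + p_\lambda(|d|)$, so the optimal $s$ is always $y_1 + y_2$ and only $d \in [0, y_1 - y_2]$ needs to be searched. The sketch below uses a brute-force grid over $d$ (chosen for transparency, not efficiency) and compares it with the TV closed form of parts (c) and (d); the names and grid size are illustrative assumptions.

```python
def scad_penalty(theta, lam, a=3.7):
    """Closed-form SCAD penalty p_lambda(|theta|)."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0


def two_pixel_scad(y1, y2, lam, a=3.7, steps=20000):
    """Minimise (5) for y1 >= y2 by a grid search over d = theta1 - theta2."""
    dy = y1 - y2
    best_d, best_val = 0.0, float("inf")
    for k in range(steps + 1):
        d = dy * k / steps
        val = (dy - d) ** 2 / 2.0 + scad_penalty(d, lam, a)
        if val < best_val:
            best_d, best_val = d, val
    s = y1 + y2
    return (s + best_d) / 2.0, (s - best_d) / 2.0


def two_pixel_tv(y1, y2, lam):
    """Proposition 1(c)-(d): minimiser when the penalty is lam * |theta1 - theta2|."""
    if y1 - y2 > lam:
        return y1 - lam / 2.0, y2 + lam / 2.0
    m = (y1 + y2) / 2.0
    return m, m


for y1, y2 in ((200.0, 0.0), (15.0, 10.0)):
    print((y1, y2), two_pixel_scad(y1, y2, lam=10.0), two_pixel_tv(y1, y2, lam=10.0))
```

With $\lambda = 10$, the large jump is kept exactly by SCAD but shrunk by $\lambda/2$ on each side by TV, while the small jump is merged by both, matching parts (a) to (d).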

Compared to TV or SATV, optimizing the functional (4) is more complicated, since the functional is nonconvex and using the time evolution of the corresponding Euler-Lagrange equation (i.e., gradient descent) is potentially problematic. We therefore use the following majorization-minimization (MM) algorithm instead. Note that [5] also proposed an MM algorithm for standard TV image denoising.

First, we majorize the SCAD penalty function by its first-order Taylor expansion around an initial estimate $u^{(0)}$ of the image (we could simply set $u^{(0)} = f$, for example). Because $p_\lambda$ is concave on $[0, \infty)$, its tangent line lies above it:

$$p_\lambda(|\nabla u|) \le p_\lambda(|\nabla u^{(0)}|) + p'_\lambda(|\nabla u^{(0)}|)\,\big(|\nabla u| - |\nabla u^{(0)}|\big),$$

which is illustrated by the dotted line in Fig 1(a). Using this approximation, we repeatedly solve the problem

$$u^{(k)} = \arg\min_u \|f - u\|^2 + \int_\Omega \Big[ p_\lambda(|\nabla u^{(k-1)}|) + p'_\lambda(|\nabla u^{(k-1)}|)\,\big(|\nabla u| - |\nabla u^{(k-1)}|\big) \Big], \quad k = 1, 2, \ldots, K,$$

i.e., we replace the SCAD penalty by its upper bound and then solve the new optimization problem. Dropping terms that are independent of $u$, we are actually minimizing the functional

$$u^{(k)} = \arg\min_u \|f - u\|^2 + \int_\Omega p'_\lambda(|\nabla u^{(k-1)}|)\,|\nabla u|, \quad k = 1, 2, \ldots, K, \qquad (6)$$

which has the same form as the functional with the SATV penalty (2), with a weight function $w = p'_\lambda(|\nabla u^{(k-1)}|)$ that changes with each iteration $k$. Thus the computation involved is almost identical to SATV, with an extra outer loop that updates the weight function at each iteration. Formally, each inner loop uses the evolutionary PDE derived from the Euler-Lagrange equation to solve (6):

$$u_t = \nabla \cdot \left( p'_\lambda(|\nabla u^{(k-1)}|)\, \frac{\nabla u}{|\nabla u|} \right) - 2(u - f).$$

This analogy with SATV also reveals the advantage of SCAD from another point of view: the weight function $w = p'_\lambda(|\nabla u^{(k-1)}|)$ is bounded (by $\lambda$), so there is no stability problem of the kind that arises when $w$ is inversely proportional to the first derivative, and this makes an extra tuning parameter $\epsilon$ unnecessary in the SCAD model.

From the general property of the MM algorithm [19, 5], the algorithm produces a monotonically decreasing sequence of values of the objective functional (4), which makes it very stable. In our experiments, we find that the number of iterations $K$ can be taken as small as $K = 2$, so the running time of the algorithm is comparable to both standard TV and SATV.

4. Monte-Carlo SURE for Regularization Parameter Selection

In all the above methods, the chosen value of the regularization parameter largely determines the quality of the denoised image. In this paper we use MSE as the criterion for judging the relative merits of different methods; it is defined by

$$\mathrm{MSE} = \frac{1}{N}\|u - \hat{u}\|^2,$$

where we take the original image $u$ as an $N$-dimensional vector and $\hat{u}$ is the restored image. Note that it is necessary to consider the discrete formulation in this section. Calculating the MSE requires prior knowledge of the noise-free image, which in most realistic scenarios is unavailable. When the noise is Gaussian, [14] proposed a technique called Monte-Carlo SURE, which requires no prior knowledge of the noise-free image or of the nature of the denoising algorithm. For a noisy image $f = u + n$ formulated in the discrete domain, and a denoising algorithm considered abstractly as a mapping $\hat{u} = M(f)$ that returns a restored image $\hat{u}$ with $f$ as input, [14] proved that

$$\frac{1}{N}\|f - M(f)\|^2 - \sigma^2 + \frac{2\sigma^2}{N}\,\mathrm{div}_f M(f) \qquad (7)$$

is an unbiased estimator of the true MSE, where $\sigma$ is the standard deviation of the Gaussian noise and $\mathrm{div}_f M(f)$ is the divergence of the multivariate function $M$. Note that in our context the mapping $M$ implicitly depends on the regularization parameter $\lambda$. Direct calculation of $\mathrm{div}_f M(f)$ is not feasible except for simple linear filtering operations, and [14] used the Monte Carlo approximation

$$\mathrm{div}_f M(f) \approx \frac{1}{\epsilon}\, b^T\big(M(f + \epsilon b) - M(f)\big),$$
where $b$ is an $N$-dimensional vector with i.i.d. standard normal components and $\epsilon$ is a small positive constant. That is, we artificially add more noise to the observed image, run the same denoising algorithm again, and approximate the divergence from the difference between the two recovered images. We use Monte-Carlo SURE to choose the regularization parameter whenever required in the next section. Since the noise level is assumed to be unknown in our experiments, a pilot estimate of $\sigma$ must be plugged into equation (7). In all our experiments, we used the following simple estimate, which is quite robust empirically for blocky images:

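$$\hat\sigma = \mathrm{median}\{|f_i - f_j|\}/0.954, \qquad (8)$$

where $f = (f_1, \ldots, f_N)$ is the observed image and the differences $f_i - f_j$ are taken over all neighboring pixels (four neighbors for each pixel). This estimate is based on the fact that for a normal random variable $X \sim N(0, 2\sigma^2)$, $\mathrm{median}(|X|) \approx 0.954\sigma$; in a blocky image most neighboring pixels share the same true intensity, so $f_i - f_j = n_i - n_j$ has exactly this distribution.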
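Both ingredients, the pilot estimate (8) and Monte-Carlo SURE (7), fit in a short sketch. Several choices here are assumptions made only for illustration: images are lists of rows (flattened for SURE), each neighboring pair is counted once in (8) (counting both directions duplicates every difference and leaves the median unchanged), a single probe vector $b$ is used, and the "denoiser" is a deliberately trivial linear shrinkage so that the result can be compared with the true MSE. The test image, seeds and function names are hypothetical.

```python
import random
import statistics


def estimate_sigma(img):
    """Eq. (8): median |f_i - f_j| over horizontal/vertical neighbour pairs, / 0.954."""
    rows, cols = len(img), len(img[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                diffs.append(abs(img[r][c] - img[r][c + 1]))
            if r + 1 < rows:
                diffs.append(abs(img[r][c] - img[r + 1][c]))
    return statistics.median(diffs) / 0.954


def mc_sure(f, denoise, sigma, eps=1e-3, rng=None):
    """Eq. (7) with the Monte-Carlo divergence; f is a flat list, denoise maps list -> list."""
    rng = rng or random.Random(0)
    n = len(f)
    u_hat = denoise(f)
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u_pert = denoise([fi + eps * bi for fi, bi in zip(f, b)])
    div = sum(bi * (p - q) for bi, p, q in zip(b, u_pert, u_hat)) / eps
    resid = sum((fi - ui) ** 2 for fi, ui in zip(f, u_hat)) / n
    return resid - sigma ** 2 + 2.0 * sigma ** 2 * div / n


def shrink(x):
    """Toy linear 'denoiser' used only to exercise SURE."""
    return [0.5 * xi for xi in x]


# Blocky 64x64 image (left half 0, right half 255) plus Gaussian noise, sigma = 20.
rng = random.Random(1)
size, sigma_true = 64, 20.0
clean = [[0.0 if c < size // 2 else 255.0 for c in range(size)] for _ in range(size)]
noisy = [[v + rng.gauss(0.0, sigma_true) for v in row] for row in clean]
sigma_hat = estimate_sigma(noisy)

f_flat = [v for row in noisy for v in row]
u_flat = [v for row in clean for v in row]
true_mse = sum((p - q) ** 2 for p, q in zip(shrink(f_flat), u_flat)) / len(u_flat)
print(sigma_hat, mc_sure(f_flat, shrink, sigma_hat), true_mse)
```

Replacing `shrink` by a TV, SATV or SCAD solver at a given $\lambda$ and scanning $\lambda$ would give the selection procedure described above; averaging the divergence over several probe vectors would reduce its variance.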

5. Experiments 

First, we compare the performance of the three approaches TV, SATV and SCAD using the simple black-and-white image shown in Fig 2(a). In this first experiment, we do not choose a single regularization parameter but compare performance over a wide range of regularization parameters. Independent Gaussian noise with standard deviation $\sigma = 10$, 20 and 40 is added to the original image, and the result is taken as the observed noisy input. For the initial step of SATV, we use TV with the optimal parameter $\lambda$ to estimate the weight function. We also search for a good value of $\epsilon$ in the second step (based on minimizing the true MSE) over $\epsilon \in \{1, 10, 100, 500\}$; it turns out that for all three noise levels $\epsilon = 10$ gives the best result for this image. Note that we take the intensity values of an image to lie in the range $[0, 255]$. Both choices actually make the results more favorable to SATV, but we will see that even so it is outperformed by SCAD. Fig 3 shows how the true MSE evolves with the regularization parameter for the three methods, with the subfigures corresponding to the different noise levels. These figures clearly show that SCAD performs better than SATV, while both are significantly better than TV. To gain some insight into the effect of the different penalties, histograms of the recovered images are shown in Fig 4 for the case $\sigma = 20$. The histograms of the TV-based restoration show that the TV estimate is biased: black pixel intensities (with original intensity value 0) are generally shifted up, while white pixel intensities (with original intensity 255) are shifted down, consistent with Proposition 1. While SATV only partially addresses this, SCAD seems to be more effective in solving this bias problem. Moreover, Fig 4(c) shows that the histogram of the image recovered with the SCAD penalty is more peaked, resulting in a smaller MSE. This experiment also shows the effect of $\epsilon$ on the result. As stated above, $\epsilon = 10$ is optimal for SATV for this image. We see from Fig 5 that using $\epsilon = 1$ or $\epsilon = 100$ makes the MSE larger. Specifically, using $\epsilon = 1$ increased the minimum MSE from 51.40 to 68.21, i.e., by about 33%, while using $\epsilon = 100$ increased the MSE by 13%. Unfortunately, there is no universally best value for $\epsilon$, and our later experiments demonstrate that the optimal $\epsilon$ is difficult to predict for different images. Choosing a wrong value for $\epsilon$ makes the performance of SATV more unpredictable. Although $\epsilon$ could be selected by methods similar to those developed for selecting $\lambda$, for example Monte-Carlo SURE, this at least significantly increases the computational burden of the algorithm. And even with a good estimate of $\epsilon$, our results here show that SATV is still worse than SCAD in terms of MSE.

Our second experiment uses the images shown in Fig 2(b) and (c). The former is again a black-and-white image, with thicker nested squares. The latter is similar in structure to Fig 2(a) but has different grayscale levels and is rotated by 45°. The image in Fig 2(b) is clearly easier to denoise owing to the larger scale of its features, so we add Gaussian noise with standard deviations $\sigma = 20$, 40 and 80. For image (c) we use four noise levels, $\sigma = 10$, 20, 40 and 80. The regularization parameters are now selected using Monte-Carlo SURE as briefly described above, with $\sigma$ assumed unknown and estimated using (8). The effectiveness of Monte-Carlo SURE has been demonstrated in general for several methods, including the TV model, in [14]. We additionally verified its performance for our SCAD model in several situations and found it to be quite accurate. As an illustration, for denoising the image shown in Fig 2(b) with $\sigma = 20$, Fig 6 shows that Monte-Carlo SURE accurately predicts the true MSE. The MSEs of the restoration results for the two images are shown in Tables 1 and 2, respectively. For the SATV method, the optimal value of $\epsilon$ in each situation is also indicated in the tables. Note that the optimal $\epsilon$ is found from the true MSE, so the results presented favor the SATV method. The reader can now see that different situations require different choices of $\epsilon$, and there seems to be no universal way of specifying a good value a priori. The conclusion is the same as before: SCAD is superior to SATV.

Finally, we use some slightly more complicated images to test performance. The Amsterdam Library of Object Images (ALOI, http://staff.science.uva.nl/~aloi/) is a color image collection of one thousand small objects, recorded for scientific purposes. We pick the four images shown in Fig 7 and convert them to grayscale; visually they look close to piecewise constant. Gaussian noise with standard deviation 40 is added to each image and the different methods are applied. The MSE results are shown in Table 3; the method using the SCAD penalty is still the best even for these more complicated images. Since the restored images are visually difficult to distinguish in print, we choose not to show them here, but they are available from http://? in MATLAB's .fig format.

6. Conclusion 

In this paper, we proposed a new penalization functional for image denoising. The penalty function is directly motivated by the well-known oracle property of the SCAD penalty from the statistical literature, originally proposed for high-dimensional regression problems. Using a simple argument in an admittedly simplistic situation, namely our two-pixel image model (5), we showed that the functional with the SCAD penalty solves the bias problem inherent in TV regularization, which is also verified by our experimental results. Compared to spatially adaptive TV, the newly proposed method eliminates the headache of choosing an extra parameter that controls the stability and adaptivity of the algorithm, and achieves a better mean squared error at the same time. Our goal in this paper is not to propose a general image denoising method to compete with the state of the art, such as wavelet-based methods or the nonlocal means method [20], which has become very popular recently, but to show that a carefully designed penalty function can improve existing PDE-based approaches without extra computational burden. Because it shrinks first-order differences to zero, the method is most suitable for recovering blocky images. One could also penalize higher-order derivatives, as has been done for TV regularization, but this is outside the scope of the current paper.
Appendix

We prove only parts (a) and (b) of the proposition; the proofs of parts (c) and (d) are similar and slightly simpler. Let

$$Q(\theta_1, \theta_2) = (y_1 - \theta_1)^2 + (y_2 - \theta_2)^2 + p_\lambda(|\theta_1 - \theta_2|).$$

Clearly the minimizer satisfies $\theta_1 \ge \theta_2$ when $y_1 > y_2$ (otherwise exchanging the values of $\theta_1$ and $\theta_2$ makes the functional smaller). The partial derivatives are (for $\theta_1 > \theta_2$)

$$\frac{\partial Q}{\partial \theta_1} = 2(\theta_1 - y_1) + p'_\lambda(|\theta_1 - \theta_2|), \qquad \frac{\partial Q}{\partial \theta_2} = 2(\theta_2 - y_2) - p'_\lambda(|\theta_1 - \theta_2|).$$

The only complication comes from the nondifferentiability at $\theta_1 = \theta_2$. When constrained to $\theta_1 = \theta_2$, it is easy to see from the quadratic form of $Q$ that the only potential minimizer is $\theta_1 = \theta_2 = (y_1 + y_2)/2$. Meanwhile, when $y_1 - y_2 > a\lambda$, we have

$$Q\Big(\frac{y_1 + y_2}{2}, \frac{y_1 + y_2}{2}\Big) = \frac{(y_1 - y_2)^2}{2} > \frac{(a+1)\lambda^2}{2} = p_\lambda(|y_1 - y_2|) = Q(y_1, y_2),$$

where the inequality uses $(y_1 - y_2)^2 > a^2\lambda^2 > (a+1)\lambda^2$ for $a = 3.7$. Thus the minimizer must satisfy $\theta_1 \ne \theta_2$, so the functional is differentiable near the minimizer, which in turn implies that both partial derivatives vanish. Adding and subtracting the two partial derivatives, we get

$$\theta_1 + \theta_2 = y_1 + y_2, \qquad (9)$$
$$\theta_1 - \theta_2 = y_1 - y_2 - p'_\lambda(|\theta_1 - \theta_2|). \qquad (10)$$

From (10), $x = \theta_1 - \theta_2$ is a solution of the equation $x + p'_\lambda(x) = y_1 - y_2$. Written out explicitly, the function on the left-hand side is

$$x + p'_\lambda(x) = \begin{cases} \lambda + x, & 0 < x \le \lambda, \\ \dfrac{a\lambda}{a-1} + \Big(1 - \dfrac{1}{a-1}\Big)x, & \lambda < x \le a\lambda, \\ x, & x > a\lambda, \end{cases} \qquad (11)$$

which is strictly increasing for $x > 0$, so the equation $x + p'_\lambda(x) = y_1 - y_2$ has the unique solution $x = y_1 - y_2$ when $y_1 - y_2 > a\lambda$. Combining this with (9), we get $\theta_1 = y_1$, $\theta_2 = y_2$, and part (a) is proved.

For part (b), if the minimizer satisfies $\theta_1 \ne \theta_2$, so that it is a stationary point, then $\theta_1 - \theta_2 > 0$ is a solution of $x + p'_\lambda(x) = y_1 - y_2$ by exactly the same argument as before. From (11), the left-hand side is bounded below by $\lambda > 0$, so no solution exists when $y_1 - y_2 < \lambda$, a contradiction. Hence $\theta_1 = \theta_2$, and it is immediate from the form of $Q(\theta_1, \theta_2)$ that $\theta_1 = \theta_2 = (y_1 + y_2)/2$.
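The two facts about (11) used in the proof, strict monotonicity and the lower bound $\lambda$, can be checked numerically. The sketch below, with $\lambda = 1$, $a = 3.7$ and a grid on $(0, 6]$ chosen only for illustration, compares $x + p'_\lambda(x)$ against the piecewise formula (11); function names are hypothetical.

```python
def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(theta) for theta >= 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def g_closed_form(x, lam, a=3.7):
    """Right-hand side of (11) for x > 0."""
    if x <= lam:
        return lam + x
    if x <= a * lam:
        return a * lam / (a - 1.0) + (1.0 - 1.0 / (a - 1.0)) * x
    return x


lam, a = 1.0, 3.7
xs = [k / 1000.0 for k in range(1, 6001)]  # grid on (0, 6]
g = [x + scad_derivative(x, lam, a) for x in xs]
max_gap = max(abs(gv - g_closed_form(x, lam, a)) for gv, x in zip(g, xs))
increasing = all(g2 > g1 for g1, g2 in zip(g, g[1:]))
print(max_gap, increasing, min(g))  # min(g) sits just above lam, approached as x -> 0+
```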
Noise level    TV        SATV              SCAD
σ = 20         31.97     24.96 (ε = 100)   17.13
σ = 40         114.71    95.76 (ε = 100)   92.00
σ = 80         415.11    387.10 (ε = 100)  383.77

Table 1: MSE of different methods on the image shown in Fig 2(b).



Noise level    TV        SATV              SCAD
σ = 10         37.02     34.10 (ε = 10)    29.10
σ = 20         99.37     92.46 (ε = 10)    77.39
σ = 40         370.68    275.08 (ε = 10)   266.65
σ = 80         886.95    858.32 (ε = 100)  805.66

Table 2: MSE of different methods on the image shown in Fig 2(c).



Image     TV       SATV              SCAD
duck      77.20    75.80 (ε = 100)   69.70
person    93.22    84.89 (ε = 100)   79.35
board     82.58    74.95 (ε = 100)   68.87
fish      70.99    63.69 (ε = 100)   55.58

Table 3: MSE of different methods applied to four object images from ALOI with σ = 40.





Figure 4: Histograms of restored image intensities overlaid on each other. (a) Histogram of restored image intensities obtained by SATV over that obtained by the TV model. (b) Histogram of restored image intensities obtained by SCAD over that obtained by the TV model. (c) Histogram of restored image intensities obtained by SCAD over that obtained by the SATV model.
Figure 5: Comparison of MSE for the SATV model with different values of ε, at noise level σ = 20.






